Automating Construction of Machine Learning Models with Clinical Big Data: Rationale and Methods

Authors

  • Gang Luo
  • Bryan L Stone
  • Michael D Johnson
  • Peter Tarczy-Hornoch
  • Adam B Wilcox
  • Sean D Mooney
  • Xiaoming Sheng
  • Peter J Haug
  • Mario Capecchi
Abstract

Background: To improve health outcomes and cut healthcare costs, we often need to perform prediction/classification on large clinical data sets, a.k.a. “clinical big data,” for example to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for doing this. Machine learning has won most data science competitions and could support many clinical activities, yet only 15% of hospitals use it, even for limited purposes. Although they are familiar with their data, healthcare researchers often lack the machine learning expertise needed to use clinical big data directly, which makes it hard for them to realize value from their data. Healthcare researchers can work with data scientists who have deep machine learning knowledge, but it takes time and effort for both parties to communicate effectively. Facing a U.S. shortage of data scientists and hiring competition from companies with deep pockets, healthcare systems have difficulty recruiting data scientists. Building and generalizing a machine learning model often requires hundreds to thousands of manual iterations by data scientists to select a) hyper-parameter values and complex algorithms, which greatly affect model accuracy, and b) operators and periods for temporally aggregating clinical attributes (e.g., whether a patient’s weight kept rising in the past year). This process becomes infeasible with limited budgets.

Objective: This study’s goal is to enable healthcare researchers to use clinical big data directly, make machine learning feasible with limited budgets and data scientist resources, and realize value from data.
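Item b) in the Background, choosing operators and periods for temporally aggregating a clinical attribute, can be made concrete with a small sketch. The abstract does not list the operators or periods the study uses; the operator set (count, mean, max, kept_rising, slope_per_day), the 90/180/365-day periods, and all function and feature names below are hypothetical choices for illustration only. Each (operator, period) pair yields one candidate feature, which is why the number of combinations to try grows quickly.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical input: one patient's (measurement date, weight in kg) records.

def window(records, end, days):
    """Keep records whose date falls in the half-open period (end - days, end]."""
    start = end - timedelta(days=days)
    return [(d, v) for d, v in records if start < d <= end]

def kept_rising(records):
    """True if every value is strictly greater than the one before it (needs >= 2 values)."""
    values = [v for _, v in sorted(records)]
    return len(values) >= 2 and all(b > a for a, b in zip(values, values[1:]))

def slope_per_day(records):
    """Least-squares slope of value versus days elapsed; None if it cannot be estimated."""
    pts = sorted(records)
    if len(pts) < 2:
        return None
    t0 = pts[0][0]
    xs = [(d - t0).days for d, _ in pts]
    ys = [v for _, v in pts]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # all measurements on the same day
        return None
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

# Hypothetical candidate aggregation operators and look-back periods.
OPERATORS = {
    "count": len,
    "mean": lambda r: mean(v for _, v in r) if r else None,
    "max": lambda r: max(v for _, v in r) if r else None,
    "kept_rising": kept_rising,
    "slope_per_day": slope_per_day,
}
PERIODS_DAYS = [90, 180, 365]

def aggregate(records, index_date):
    """Apply every operator over every look-back period ending at index_date."""
    features = {}
    for days in PERIODS_DAYS:
        in_window = window(records, index_date, days)
        for name, op in OPERATORS.items():
            features[f"weight_{name}_{days}d"] = op(in_window)
    return features

# Usage with made-up measurements.
weights = [
    (date(2016, 1, 10), 70.0),
    (date(2016, 4, 2), 71.5),
    (date(2016, 8, 20), 72.0),
    (date(2016, 12, 1), 73.4),
]
feats = aggregate(weights, date(2016, 12, 31))
print(feats["weight_kept_rising_365d"], feats["weight_count_90d"])
```

With five operators and three periods, a single attribute already produces 15 candidate features; a data scientist (or an automated tool) still has to decide which of them help the model.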
Methods: This study will 1) finish developing Auto-ML (Automated Machine Learning), new software that automates model selection for machine learning with clinical big data, and validate Auto-ML on seven benchmark modeling problems of clinical importance; 2) apply Auto-ML and novel methodology to two new modeling problems crucial for care management allocation, and pilot one model with care managers; and 3) perform simulations to estimate the impact of adopting Auto-ML on U.S. patient outcomes.

Results: We are currently writing Auto-ML’s design document and intend to finish the study in about five years.

Conclusions: Auto-ML will generalize to various clinical prediction/classification problems. With minimal help from data scientists, healthcare researchers will be able to use Auto-ML to quickly build high-quality models. This will broaden the use of machine learning in healthcare and improve patient outcomes.
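Automating model selection, as described in Methods item 1 and Background item a), means searching jointly over algorithms and their hyper-parameter values. The abstract does not say which search strategy Auto-ML uses, so the sketch below swaps in plain random search, one of the simplest such strategies. The two toy algorithms (k-nearest neighbors and a one-feature threshold rule), their grids, the synthetic data, and holdout accuracy as the score are all hypothetical stand-ins; a real clinical system would more likely use cross-validation and a metric suited to the task.

```python
import random
from collections import Counter

# Hypothetical joint search space: algorithm name -> hyper-parameter grid.
SEARCH_SPACE = {
    "knn": {"k": [1, 3, 5, 7, 9]},
    "threshold": {"feature": [0, 1], "cutoff": [0.2, 0.4, 0.6, 0.8]},
}

def knn_predict(train, x, k):
    """Majority label among the k training rows nearest to x (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def threshold_predict(train, x, feature, cutoff):
    """One-feature rule: predict 1 when the chosen feature exceeds the cutoff (ignores train)."""
    return int(x[feature] > cutoff)

PREDICTORS = {"knn": knn_predict, "threshold": threshold_predict}

def accuracy(algorithm, params, train, valid):
    """Fraction of validation rows the configured algorithm labels correctly."""
    predict = PREDICTORS[algorithm]
    hits = sum(predict(train, x, **params) == y for x, y in valid)
    return hits / len(valid)

def random_search(train, valid, n_trials, seed=0):
    """Sample (algorithm, hyper-parameters) pairs at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        algorithm = rng.choice(sorted(SEARCH_SPACE))
        params = {name: rng.choice(values) for name, values in SEARCH_SPACE[algorithm].items()}
        score = accuracy(algorithm, params, train, valid)
        if best is None or score > best[0]:
            best = (score, algorithm, params)
    return best

def make_data(n, rng):
    """Synthetic 2-feature rows labeled 1 when the first feature exceeds 0.5."""
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        rows.append((x, int(x[0] > 0.5)))
    return rows

# Usage on made-up data.
data_rng = random.Random(42)
train, valid = make_data(80, data_rng), make_data(40, data_rng)
best_score, best_algorithm, best_params = random_search(train, valid, n_trials=20)
print(best_algorithm, best_params, round(best_score, 3))
```

Each trial here stands in for one of the manual iterations the Background describes; smarter strategies such as Bayesian optimization aim to reach a good configuration in fewer trials than random sampling.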

Similar resources

Automating Construction of Machine Learning Models With Clinical Big Data: Proposal Rationale and Methods

BACKGROUND To improve health outcomes and cut health care costs, we often need to conduct prediction/classification using large clinical datasets (aka, clinical big data), for example, to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for doing this. Machine learning has won most data science competitions and could support many c...

Thermal conductivity of Water-based nanofluids: Prediction and comparison of models using machine learning

Statistical methods, and especially machine learning, have been increasingly used in nanofluid modeling. This paper presents some of the interesting and applicable methods for thermal conductivity prediction and compares them with each other according to results and errors that are defined. The thermal conductivity of nanofluids increases with the volume fraction and temperature. Machine learni...

PredicT-ML: a tool for automating machine learning model building with big clinical data

BACKGROUND Predictive modeling is fundamental to transforming large clinical data sets, or "big clinical data," into actionable knowledge for various healthcare applications. Machine learning is a major predictive modeling approach, but two barriers make its use in healthcare challenging. First, a machine learning tool user must choose an algorithm and assign one or more model parameters called...

Automated Machine Learning on Big Data using Stochastic Algorithm Tuning

We introduce a means of automating machine learning (ML) for big data tasks, by performing scalable stochastic Bayesian optimisation of ML algorithm parameters and hyper-parameters. More often than not, the critical tuning of ML algorithm parameters has relied on domain expertise from experts, along with laborious handtuning, brute search or lengthy sampling runs. Against this background, Bayes...

Publication date: 2017